Overriding profile endpoint with analyze endpoint with operator tree and profiling #5568

Open
Krish-Gandhi wants to merge 2 commits into opensearch-project:main from Krish-Gandhi:feature/analyze-endpoint

@Krish-Gandhi Krish-Gandhi commented Jun 19, 2026

Description

  • Introduces the analyze endpoint for PPL queries, which is activated by passing "analyze": true as a request body parameter.
  • Overrides the existing profile endpoint so that it runs the analyze functionality.
  • Propagates the PPL analyze flag through transport and request parsing.
  • Combines logical plan and physical plan nodes by walking each tree and grouping nodes into an operator_tree.
  • Maps PPL query segments and estimated/actual rows to operator_tree nodes.
  • Combines the current profile plan response with operator_tree nodes to calculate the time taken by each node.
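As a rough illustration of the row mapping: in the example response further down, each operator_tree node's "estimated_rows" matches the rowcount printed in the last plan line of its "description" array. The sketch below extracts that value with a regular expression. It is an assumption based on that example only, not the PR's implementation; the class name EstimatedRowsSketch and both method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not a class from this PR. */
final class EstimatedRowsSketch {
  // Matches the "rowcount = 5000.0" fragment that appears in each plan line.
  private static final Pattern ROWCOUNT =
      Pattern.compile("rowcount = ([0-9]+(?:\\.[0-9]+)?)");

  /** Returns the rowcount estimate embedded in one plan line, or -1 if none is found. */
  static long rowCount(String planLine) {
    Matcher m = ROWCOUNT.matcher(planLine);
    return m.find() ? Math.round(Double.parseDouble(m.group(1))) : -1L;
  }

  /** Assumed rule: a group's estimate is the rowcount of its last plan line. */
  static long estimatedRows(List<String> descriptions) {
    if (descriptions.isEmpty()) {
      return -1L;
    }
    return rowCount(descriptions.get(descriptions.size() - 1));
  }

  public static void main(String[] args) {
    // Plan lines copied from the first operator_tree node of the example response.
    List<String> group = List.of(
        "CalciteLogicalIndexScan(table=[[OpenSearch, accounts]]): rowcount = 10000.0, "
            + "cumulative cost = {99000.0 rows, 0.0 cpu, 0.0 io}, id = 14851",
        "LogicalFilter(condition=[<($2, 30)]): rowcount = 5000.0, "
            + "cumulative cost = {104000.0 rows, 10000.0 cpu, 0.0 io}, id = 14852");
    System.out.println(estimatedRows(group));
  }
}
```

The PR may well read the estimate directly from the plan nodes rather than from their string form; string parsing is used here only to keep the sketch self-contained.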

Important

This PR overrides the existing profile endpoint by routing all requests with either "analyze": true or "profile": true (or both) to pplService.analyze() in TransportPPLQueryAction.java. For current end-users of the profile endpoint this should make little difference, because the response from profile is a subset of the response from analyze: existing clients need no changes and will receive similar results, with the additional analyze fields included.

All of the existing profile code remains in place, so separating analyze and profile again is straightforward: remove the else block on lines 196-199 of TransportPPLQueryAction.java and replace it with the commented-out else if and else blocks on lines 203-211.

plugin/src/main/java/org/opensearch/sql/plugin/transport/TransportPPLQueryAction.java:193-215:

    if (transformedRequest.isExplainRequest()) {
      pplService.explain(
          transformedRequest, createExplainResponseListener(transformedRequest, clearingListener));
    } else {
      pplService.analyze(
          transformedRequest, createAnalyzeResponseListener(transformedRequest, clearingListener));
    }
    /**
     * Commenting out lines 196-199 and replacing them with lines 203-211 will
     * separate the `profile` and `analyze` endpoints. See PR #5568.
     */
    // } else if (transformedRequest.analyze()) {
    //   pplService.analyze(
    //     transformedRequest, createAnalyzeResponseListener(transformedRequest, clearingListener));
    // } else {
    //   pplService.execute(
    //       transformedRequest,
    //       createListener(transformedRequest, clearingListener),
    //       createExplainResponseListener(transformedRequest, clearingListener));
    // }
  }
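To make the two routing variants easier to compare, here is a minimal sketch that reduces the snippet above to pure decision functions. The Route enum and the RoutingSketch class are hypothetical stand-ins, not classes from the PR. The combined variant mirrors the snippet as shown, where every non-explain request reaches pplService.analyze(); how analyze() treats a request with neither flag set is not shown in this description.

```java
/** Hypothetical stand-in for the dispatch in TransportPPLQueryAction; not the PR's code. */
final class RoutingSketch {
  enum Route { EXPLAIN, ANALYZE, EXECUTE }

  /** Current behavior in the snippet: anything that is not an explain request is analyzed. */
  static Route combined(boolean explainRequest) {
    return explainRequest ? Route.EXPLAIN : Route.ANALYZE;
  }

  /**
   * Behavior of the commented-out alternative: only requests with the analyze flag
   * reach analyze; everything else, including profile requests, goes to execute.
   */
  static Route separated(boolean explainRequest, boolean analyzeFlag) {
    if (explainRequest) {
      return Route.EXPLAIN;
    }
    return analyzeFlag ? Route.ANALYZE : Route.EXECUTE;
  }

  public static void main(String[] args) {
    // A profile-only request: not an explain request, analyze flag unset.
    System.out.println("combined:  " + combined(false));
    System.out.println("separated: " + separated(false, false));
  }
}
```

Under these assumptions, a profile-only request is the case whose route differs between the two variants.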

Example Query and Response

The following curl command runs the query source=accounts | where age < 30 | eval full_name = firstname + " " + lastname | fields full_name, email, age through the analyze endpoint:

curl -X POST "localhost:9200/_plugins/_ppl" \
  -H "Content-Type: application/json" \
  -d '{"query": "source=accounts | where age < 30 | eval full_name = firstname + \" \" + lastname | fields full_name, email, age", "analyze": true}'
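For callers that prefer Java over curl, the same request can be built with the JDK's java.net.http API (Java 11 or later). This is a minimal sketch: the AnalyzeRequestSketch class and its helper methods are hypothetical, the host and port are taken from the curl command above, and the hand-rolled escaping only covers backslashes and double quotes, which is enough for this query but not a general JSON encoder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper for illustration; not part of the PR. */
final class AnalyzeRequestSketch {
  /** Escapes backslashes and double quotes so the query can sit inside a JSON string. */
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /** Builds the request body with the "analyze" flag set, as in the curl example. */
  static String body(String pplQuery) {
    return "{\"query\": \"" + escape(pplQuery) + "\", \"analyze\": true}";
  }

  /** Builds (but does not send) the POST request to the PPL endpoint. */
  static HttpRequest build(String pplQuery) {
    return HttpRequest.newBuilder(URI.create("http://localhost:9200/_plugins/_ppl"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body(pplQuery)))
        .build();
  }

  public static void main(String[] args) {
    String query = "source=accounts | where age < 30 "
        + "| eval full_name = firstname + \" \" + lastname "
        + "| fields full_name, email, age";
    HttpRequest request = build(query);
    System.out.println(request.method() + " " + request.uri());
    System.out.println(body(query));
  }
}
```

Sending the request requires a running cluster; with one available, HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) would return the JSON response as a string.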

The response is shown below. (NOTE: The "logicalPlan" and "physicalPlan" fields are included for debugging purposes and should be removed from the final version of this endpoint.)

{
  "query": "source=accounts | where age < 30 | eval full_name = firstname + \" \" + lastname | fields full_name, email, age",
  "logicalPlan": [
    "LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]): rowcount = 5000.0, cumulative cost = {114000.0 rows, 145000.0 cpu, 0.0 io}, id = 14855",
    "LogicalProject(full_name=[||(||($0, ' '), $4)], email=[$3], age=[$2]): rowcount = 5000.0, cumulative cost = {109000.0 rows, 25000.0 cpu, 0.0 io}, id = 14854",
    "LogicalFilter(condition=[<($2, 30)]): rowcount = 5000.0, cumulative cost = {104000.0 rows, 10000.0 cpu, 0.0 io}, id = 14852",
    "CalciteLogicalIndexScan(table=[[OpenSearch, accounts]]): rowcount = 10000.0, cumulative cost = {99000.0 rows, 0.0 cpu, 0.0 io}, id = 14851"
  ],
  "physicalPlan": [
    "EnumerableCalc: rowcount = 1.0, cumulative cost = {2.0 rows, 2.0 cpu, 0.0 io}, id = 14948",
    "CalciteEnumerableIndexScan: rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 14947"
  ],
  "profile": {
    "summary": {
      "total_time_ms": 761.06
    },
    "phases": {
      "analyze": {
        "time_ms": 1.6
      },
      "optimize": {
        "time_ms": 6.76
      },
      "execute": {
        "time_ms": 752.48
      },
      "format": {
        "time_ms": 0.18
      }
    },
    "plan": {
      "node": "EnumerableCalc",
      "time_ms": 751.68,
      "rows": 3,
      "children": [
        {
          "node": "CalciteEnumerableIndexScan",
          "time_ms": 751.51,
          "rows": 3
        }
      ]
    }
  },
  "operator_tree": [
    {
      "source": "source=accounts | where age < 30",
      "node_type": [
        "SearchFrom",
        "WhereCommand"
      ],
      "description": [
        "CalciteLogicalIndexScan(table=[[OpenSearch, accounts]]): rowcount = 10000.0, cumulative cost = {99000.0 rows, 0.0 cpu, 0.0 io}, id = 14851",
        "LogicalFilter(condition=[<($2, 30)]): rowcount = 5000.0, cumulative cost = {104000.0 rows, 10000.0 cpu, 0.0 io}, id = 14852"
      ],
      "estimated_rows": 5000,
      "actual_time_ms": "751.51 ms",
      "actual_rows": 3,
      "is_pushed_down": true
    },
    {
      "source": "eval full_name = firstname + \" \" + lastname | fields full_name, email, age",
      "node_type": [
        "EvalCommand",
        "FieldsCommand"
      ],
      "description": [
        "LogicalProject(full_name=[||(||($0, ' '), $4)], email=[$3], age=[$2]): rowcount = 5000.0, cumulative cost = {109000.0 rows, 25000.0 cpu, 0.0 io}, id = 14854"
      ],
      "estimated_rows": 5000,
      "actual_time_ms": "0.17 ms",
      "actual_rows": 3
    }
  ],
  "recommendations": [],
  "schema": [
    {
      "name": "full_name",
      "type": "VARCHAR"
    },
    {
      "name": "email",
      "type": "VARCHAR"
    },
    {
      "name": "age",
      "type": "BIGINT"
    }
  ],
  "datarows": [
    {
      "full_name": "Jane Smith",
      "email": "jane@example.com",
      "age": 28
    },
    {
      "full_name": "Kyle Miller",
      "email": "goat@example.com",
      "age": 22
    },
    {
      "full_name": "Joette Kap",
      "email": "coast2coast@example.com",
      "age": 22
    }
  ],
  "total": 3,
  "size": 3
}
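The per-node times in this response are consistent with subtracting each child's time from its parent's time in "profile.plan": 751.68 ms for EnumerableCalc minus 751.51 ms for CalciteEnumerableIndexScan gives the 0.17 ms reported for the eval/fields node. The sketch below shows that calculation under the assumption that plan times are inclusive of children; SelfTimeSketch and PlanNode are hypothetical names, and the PR's actual calculation may differ.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of deriving per-node time from an inclusive-time plan tree. */
final class SelfTimeSketch {
  /** Minimal stand-in for one node of the "plan" object in the profile response. */
  static final class PlanNode {
    final String node;
    final double timeMs; // assumed to include the time spent in children
    final List<PlanNode> children;

    PlanNode(String node, double timeMs, List<PlanNode> children) {
      this.node = node;
      this.timeMs = timeMs;
      this.children = children;
    }
  }

  /** Time attributed to the node itself: its own time minus the sum of its children's times. */
  static double selfTimeMs(PlanNode n) {
    double childTotal = 0.0;
    for (PlanNode child : n.children) {
      childTotal += child.timeMs;
    }
    // Clamp at zero in case measurement noise makes children appear slower than the parent.
    return Math.max(0.0, n.timeMs - childTotal);
  }

  public static void main(String[] args) {
    // Values copied from the "plan" object in the example response.
    PlanNode scan = new PlanNode("CalciteEnumerableIndexScan", 751.51, List.of());
    PlanNode calc = new PlanNode("EnumerableCalc", 751.68, List.of(scan));
    System.out.println(String.format(Locale.ROOT, "%s: %.2f ms", calc.node, selfTimeMs(calc)));
    System.out.println(String.format(Locale.ROOT, "%s: %.2f ms", scan.node, selfTimeMs(scan)));
  }
}
```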

Performance of analyze

A benchmarking script that runs a query against a sample dataset produced the following results:

=== PPL Query Benchmark ===
Query: source=opensearch_dashboards_sample_data_flights | where DistanceMiles > 1000 | stats avg(FlightTimeMin) as avg_time, count() as cnt by DestCountry | sort - cnt | head 10
Iterations: 100

--- Results (after 3 warmup runs each) ---

normal      avg=1.3ms  p50=1.2ms  p95=2.0ms  min=0.9ms  max=2.3ms
profile     avg=1.6ms  p50=1.6ms  p95=2.1ms  min=1.0ms  max=2.4ms
analyze     avg=1.6ms  p50=1.6ms  p95=2.1ms  min=0.9ms  max=3.1ms
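The benchmarking script itself is not included in this description. As a rough guide to how figures of this shape can be produced, the sketch below times a task after a warmup phase and summarizes the samples using the nearest-rank percentile method. All names are hypothetical, the percentile method is an assumption (the author's script may use a different one), and the sample values in main are made up for illustration rather than measured.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical latency benchmark helper; not the script used for the results above. */
final class LatencyStatsSketch {
  /** Runs the task warmup + iterations times and returns the timed samples in milliseconds. */
  static double[] measure(Runnable task, int warmup, int iterations) {
    for (int i = 0; i < warmup; i++) {
      task.run(); // warmup runs are executed but not recorded
    }
    double[] samplesMs = new double[iterations];
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      task.run();
      samplesMs[i] = (System.nanoTime() - start) / 1_000_000.0;
    }
    return samplesMs;
  }

  /** Nearest-rank percentile over an ascending-sorted, non-empty array; p in (0, 100]. */
  static double percentile(double[] sorted, double p) {
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Formats one result row in the same layout as the table above. */
  static String summarize(String label, double[] samplesMs) {
    if (samplesMs.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    double[] s = samplesMs.clone();
    Arrays.sort(s);
    double avg = Arrays.stream(s).average().orElse(Double.NaN);
    return String.format(Locale.ROOT,
        "%-10s  avg=%.1fms  p50=%.1fms  p95=%.1fms  min=%.1fms  max=%.1fms",
        label, avg, percentile(s, 50), percentile(s, 95), s[0], s[s.length - 1]);
  }

  public static void main(String[] args) {
    // Made-up sample values, used only to show the output format.
    double[] illustrative = {1.0, 2.0, 3.0, 4.0};
    System.out.println(summarize("example", illustrative));
  }
}
```

In a real run, the task passed to measure would issue the PPL request once per call, with the "profile" or "analyze" flag set as appropriate.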

Related Issues

#5500
Resolves #4343

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…and profiling

Signed-off-by: Krish Gandhi <kjg2352@gmail.com>
Signed-off-by: Krish Gandhi <kjg2352@gmail.com>
@Krish-Gandhi Krish-Gandhi marked this pull request as ready for review June 19, 2026 22:16
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support analyze alongside explain